README for LLM-Annotated Facebook Comments Dataset
Submitted to ACL 2026 Findings | Harvard Dataverse

Overview
--------
This dataset contains 47,889 comments (one per row, described by 51 columns) posted under political Facebook posts in the Philippines. Each comment was annotated by a large language model for positivity, civility, reciprocity, justification, and constructiveness, supporting analysis of user interaction, engagement, and the tone of political discussion on social media.

This dataset accompanies a paper submitted to ACL 2026 Findings:

    Herrera, M. J., Jaidka, K., and Luyt, B. Misinformation Commands Attention:
    An English-Tagalog Dataset of Political Discussions on Philippine Facebook Pages.
    Submitted to Findings of the Association for Computational Linguistics: ACL 2026.

Dataset Structure
-----------------
Number of Rows: 47,889
Number of Columns: 51
Language(s): English and Filipino (Tagalog)
Source: Facebook political pages (Philippines)

Key Columns:

1. Comment Metadata:
   - Post Date: The date the comment was published.
   - Content: Full textual content of the comment (English and/or Filipino).
   - Site Name and Site URL: The source platform's name and URL.
   - Channel Information: Channel name, type, and country.
   - Article/Comment: Indicates whether the row represents a Facebook comment or another type of text.

2. Sentiment and Engagement:
   - Influence Score: A metric to evaluate the comment's impact.
   - Sentiment Score: Sentiment analysis score for the comment.
   - No. of Comments: Count of replies or related comments.

3. Temporal Data:
   - Date breakdown fields:
     - YearKey, MonthOfYear, DayOfWeekName, etc.
     - QuarterOfYear and WeekOfMonth for granular temporal analysis.

4. Annotation Categories (LLM-generated, based on codebook):
   - Positive/Respectful:
     - 01_positive: Indicates whether the comment shows respect or empathy.
   - Uncivil:
     - 02_uncivil_no: Not uncivil.
     - 02_uncivil_yes_abuses_sledging: Uses abusive language or slurs.
     - 02_uncivil_yes_threatening: Contains threats or ideological extremes.
     - 02_uncivil_yes_exaggeration: Uses exaggerated arguments.
   - Reciprocity:
     - 03_reciprocity: Indicates whether the comment elicits opinions or information.
   - Justification:
     - 04_no_justification: Lacks justification.
     - 04_yes_justification_fact_based: Provides fact-based justification.
     - 04_yes_justification_personal: Contains personal feelings or experiences.
   - Constructiveness:
     - 05_not_constructive: Not constructive.
     - 05_yes_constructive_fact_checking: Includes fact-checking.
     - 05_yes_constructiveness_common_ground: Seeks common ground.
     - 05_yes_constructiveness_solution: Offers a solution.
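
The two-digit prefixes in the annotation column names make it straightforward to group columns by codebook category. The following is a minimal sketch, not part of the dataset tooling: the helper name `group_annotation_columns` is hypothetical, and the column names are copied from the list above.

```python
import re
from collections import defaultdict

# Annotation column names as listed in this README.
ANNOTATION_COLUMNS = [
    "01_positive",
    "02_uncivil_no",
    "02_uncivil_yes_abuses_sledging",
    "02_uncivil_yes_threatening",
    "02_uncivil_yes_exaggeration",
    "03_reciprocity",
    "04_no_justification",
    "04_yes_justification_fact_based",
    "04_yes_justification_personal",
    "05_not_constructive",
    "05_yes_constructive_fact_checking",
    "05_yes_constructiveness_common_ground",
    "05_yes_constructiveness_solution",
]

# Category labels for each two-digit prefix, taken from the list above.
CATEGORY_NAMES = {
    "01": "Positive/Respectful",
    "02": "Uncivil",
    "03": "Reciprocity",
    "04": "Justification",
    "05": "Constructiveness",
}


def group_annotation_columns(columns):
    """Map each category name to the columns that carry its prefix.

    Columns without a known two-digit prefix (metadata, dates) are skipped.
    """
    groups = defaultdict(list)
    for name in columns:
        match = re.match(r"^(\d{2})_", name)
        if match and match.group(1) in CATEGORY_NAMES:
            groups[CATEGORY_NAMES[match.group(1)]].append(name)
    return dict(groups)


# Metadata columns are mixed in to show that they are ignored.
groups = group_annotation_columns(["Post Date", "Content"] + ANNOTATION_COLUMNS)
for category, cols in groups.items():
    print(f"{category}: {len(cols)} column(s)")
```

Matching on the prefix rather than on the full name keeps the grouping robust to the slight naming differences among the 05_ columns.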

Usage and Applications
----------------------
This dataset is particularly suited to NLP and computational social science research. Intended applications include:

1. Misinformation and Disinformation Analysis: Investigating patterns of uncivil or misleading comments in Philippine political discourse.
2. Sentiment and Tone Analysis: Studying positive, neutral, and negative sentiments in online discussions.
3. Civility and Online Behavior: Exploring the prevalence and types of uncivil behavior in code-switched text.
4. Constructive Discussion Analysis: Analyzing how comments provide justification, seek common ground, or propose solutions.
5. LLM Annotation Research: Evaluating the reliability and consistency of LLM-generated annotations for multilingual social media data.
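
For application 5, one common way to compare LLM annotations with human annotations is Cohen's kappa. The sketch below is illustrative only: the dataset does not state that it ships human labels, so the `human` list is hypothetical, and `cohen_kappa` is a hand-written helper rather than a function from any library.

```python
import math


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of boolean labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal rate of True labels.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return math.nan
    return (observed - expected) / (1 - expected)


# Hypothetical labels for one binary column (for example, 01_positive).
llm = [True, True, False, False, True, False, False, False]
human = [True, False, False, False, True, False, True, False]

kappa = cohen_kappa(llm, human)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters for rare categories where two annotators would agree on "False" most of the time regardless.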

Sample Data
-----------
| Post Date  | Content                                    | 01_positive | 02_uncivil_yes_abuses_sledging | 03_reciprocity | 04_yes_justification_fact_based  | 05_yes_constructiveness_solution |
|------------|--------------------------------------------|-------------|--------------------------------|----------------|----------------------------------|----------------------------------|
| 2025-01-15 | "I appreciate this perspective, thank you" | True        | False                          | False          | True                             | True                             |
| 2025-01-14 | "You idiots don't know anything!"          | False       | True                           | False          | False                            | False                            |
| 2025-01-13 | "Can someone explain this further?"        | True        | False                          | True           | True                             | False                            |
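
As an illustration of working with rows shaped like the sample above, the following sketch computes the share of comments flagged True for each annotation column. It assumes a CSV export in which flags are stored as the strings "True" and "False"; the actual file format and encoding on Harvard Dataverse may differ. The helper `label_prevalence` is hypothetical.

```python
import csv
import io

# The three sample rows from the table above, rewritten as CSV for illustration.
SAMPLE_CSV = '''Post Date,Content,01_positive,02_uncivil_yes_abuses_sledging,03_reciprocity,04_yes_justification_fact_based,05_yes_constructiveness_solution
2025-01-15,"I appreciate this perspective, thank you",True,False,False,True,True
2025-01-14,"You idiots don't know anything!",False,True,False,False,False
2025-01-13,"Can someone explain this further?",True,False,True,True,False
'''


def label_prevalence(rows, label_columns):
    """Return the share of rows flagged "True" for each label column."""
    rows = list(rows)
    if not rows:
        raise ValueError("no rows to summarize")
    return {
        col: sum(row[col] == "True" for row in rows) / len(rows)
        for col in label_columns
    }


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
# Annotation columns are the ones whose names start with two digits.
label_columns = [name for name in reader.fieldnames if name[:2].isdigit()]
prevalence = label_prevalence(reader, label_columns)
for col, share in prevalence.items():
    print(f"{col}: {share:.2f}")
```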

Annotation Categories (Based on Codebook)
-----------------------------------------
1. Positive/Respectful:
   - Indicates whether the comment shows respect or empathy.
   - Example: "I appreciate your perspective!"

2. Uncivil:
   - Abuses and Sledging: Includes slurs, insults, or abusive language.
   - Threatening: Contains direct or indirect threats.
   - Exaggeration: Uses extreme, exaggerated arguments.

3. Reciprocity:
   - Indicates whether the comment invites discussion or elicits information.

4. Justification:
   - Personal: Shares personal experiences or opinions.
   - Fact-Based: Cites facts, evidence, or external sources.

5. Constructiveness:
   - Fact-Checking: Provides evidence to fact-check.
   - Common Ground: Seeks areas of agreement.
   - Solution: Proposes a constructive solution.

Important Notes
---------------
- Data Privacy: Ensure compliance with applicable privacy laws (e.g., the GDPR and the Philippine Data Privacy Act of 2012) when using this dataset.

- Data Cleaning: Some fields may contain null or inconsistent values and require preprocessing.

- Annotation Subjectivity: Annotations were generated by a large language model following a structured codebook. They apply the codebook uniformly but may not match human judgment, so validate them (for example, against a human-coded sample) before relying on them.
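
The cleaning and validation notes above can be sketched as a small preprocessing step. This is a minimal example under stated assumptions: the README does not specify how flags are encoded, so the accepted spellings in `TRUE_VALUES` and `FALSE_VALUES` are guesses, and the rule that `02_uncivil_no` should be True exactly when no `02_uncivil_yes_*` flag is True is an assumption inferred from the column descriptions, not a documented guarantee. Both helper names are hypothetical.

```python
# Assumed spellings of boolean flags; adjust after inspecting the real file.
TRUE_VALUES = {"true", "1", "yes"}
FALSE_VALUES = {"false", "0", "no"}

UNCIVIL_YES = [
    "02_uncivil_yes_abuses_sledging",
    "02_uncivil_yes_threatening",
    "02_uncivil_yes_exaggeration",
]


def to_bool(value):
    """Coerce a raw cell to True or False, or None if missing or unrecognized."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None


def uncivil_flags_consistent(row):
    """Check the assumed rule for the 02_ columns on one row (a dict).

    Returns True or False, or None when any of the four cells cannot be read.
    """
    no_flag = to_bool(row.get("02_uncivil_no"))
    yes_flags = [to_bool(row.get(col)) for col in UNCIVIL_YES]
    if no_flag is None or None in yes_flags:
        return None
    # Consistent when "not uncivil" is the opposite of "any uncivil subtype".
    return no_flag != any(yes_flags)


# Hypothetical rows: one consistent, one contradictory, one with a null cell.
rows = [
    {"02_uncivil_no": "True", "02_uncivil_yes_abuses_sledging": "False",
     "02_uncivil_yes_threatening": "False", "02_uncivil_yes_exaggeration": "False"},
    {"02_uncivil_no": "True", "02_uncivil_yes_abuses_sledging": "True",
     "02_uncivil_yes_threatening": "False", "02_uncivil_yes_exaggeration": "False"},
    {"02_uncivil_no": "", "02_uncivil_yes_abuses_sledging": "False",
     "02_uncivil_yes_threatening": "False", "02_uncivil_yes_exaggeration": "False"},
]
results = [uncivil_flags_consistent(row) for row in rows]
print(results)
```

Returning None for unreadable cells, instead of defaulting to False, keeps missing annotations distinguishable from negative ones in later analysis.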

Citation
--------
If you use this dataset, please cite the associated paper, which has been submitted to ACL 2026 Findings:

    Herrera, M. J., Jaidka, K., and Luyt, B. Misinformation Commands Attention:
    An English-Tagalog Dataset of Political Discussions on Philippine Facebook Pages.
    Submitted to Findings of the Association for Computational Linguistics: ACL 2026.

The dataset is hosted on Harvard Dataverse and should also be cited independently
per the repository's citation guidelines.